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Abstract — The importance of stochasticlty within biological 
systems has been shown repeatedly during the last years and has 
raised the need for efflcient stochastic tools. We present SABRE, 
a tool for stochastic analysis of biochemical reaction networks. 
SABRE implements fast adaptive uniformization (FAU), a direct 
numerical approximation algorithm for computing transient 
solutions of biochemical reaction networks. Biochemical reactions 
networks represent biological systems studied at a molecular 
level and these reactions can be modeled as transitions of a 
Markov chain. SABRE accepts as input the formalism of guarded 
commands, which it interprets either as continuous-time or as 
discrete-time Markov chains. Besides operating in a stochastic 
mode, SABRE may also perform a deterministic analysis by 
directly computing a mean-Held approximation of the system 
under study. We illustrate the different functionalities of SABRE 
by means of biological case studies. 

I. Introduction 

Markov chains are an omnipresent modeling approach in the 
applied sciences. Often, they describe population processes, 
that is, they operate on a multidimensional discrete state space, 
where each dimension of a state represents the number of 
individuals of a certain type. Depending on the application 
area, "individuals" may be customers in a queuing network, 
molecules in a chemically reacting volume, servers in a 
computer network, actual individuals in a population, etc. 

Here, we are particularly interested in dynamical models of 
biochemical reaction networks, such as signaling pathways, 
gene expression networks, and metabolic networks. Biochem- 
ical reaction networks operate on an abstraction level where 
a state of the system is given by an n-dimensional vector of 
chemical populations, that is, the system involves n different 
types of molecules and the i-th element of the state vector 
represents the number of molecules of type i. Molecules 
collide randomly and may undergo chemical reactions, which 
change the state of the system. Classical modeling approaches 
in biochemistry are based on a system of ordinary differential 
equations that assume a continuous deterministic change of 
chemical concentrations. During the last decade, stochastic 
analysis of biochemical reaction networks has seen growing 
interest because it captures molecular noise [131 , which arises 
from the randomness of the discrete events in the cell. Molec- 
ular noise is of interest because it significantly influences 
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fundamental biological processes such as gene expression 
| fT2l 135], decisions of the cell fate||2l |29l, and circadian 
oscillations 13] [161 ■ 

Within the setting of stochastic analysis, biochemical reac- 
tion networks are modeled as discrete-state continuous-time 
Markov processes(CTMC) as suggested by Gillespie within 
the theory of stochastic chemical kinetics (14]. The evolution 
of a CTMC is given by a system of linear ordinary differential 
equations, known as the chemical master equation (CME). A 
single equation in the CME describes the time-derivative of 
the probability of a certain state at all times i > 0. Thus, 
the solution of the CME is the probability distribution over 
all states of the CTMC at a particular time t, that is, the 
transient state probabilities at time t. The solution of the 
CME is then used to derive measures of interest such as the 
distribution of switching delays f30l, the distribution of the 
time of DNA replication initiation at different origins |31| , or 
the distribution of gene expression products ll37l . Moreover, 
many parameter estimation methods require the computation 
of the posterior distribution because means and variances do 
not provide enough information to calibrate parameters [19j. 

Statistical estimation procedures such as Monte Carlo sim- 
ulation are widely used to estimate the probability distribution 
of the underlying Markov process, because for realistic sys- 
tems the size of the CME is very large or even infinite, and 
numerical methods become infeasible. Several tools for Monte 
Carlo simulation have been developed ifTSl l25l [32]| . Recent 
work, however, indicates that numerical approximation meth- 
ods for the CME can be used to compute the transient state 
probabilities more accurately and, depending on the measures 
of interest, with shorter running times |9|. Especially if the 
probabilities of interest are small, numerical approximations 
turn out to be superior to Monte Carlo simulation, because 
the later requires a large number of simulation runs in order 
to bound the statistical error appropriately. For estimating 
event probabilities, a higher precision level is necessary than 
for estimating cumulative measures such as expectations, and 
simulation based methods have a slow convergence because 
doubling the precision requires four times more simulation 
runs to be performed. 

In the case of discrete-time Markov chains (DTMCs), the 
transient stochastic analysis gives the probability distribution 
over all states of the DTMC after k steps. For population 



models, a step is interpreted as a triggered transition. The 
transient solution for DTMCs is the result of k matrix-vector 
products, and can be used for the solution of CTMCs, as shown 
later in Section IIII-AI 

Numerical analysis tools for discrete-state Markov processes 
such as PRISMI28J, INFAMYilTJ, ETMCC|22J, MRMCI26J, 
APNNtoolboxlia, SHARPEim, SPNPEI, or MobiuslH 
have been introduced (see Section |VII| l. However, except for 
INFAMY, these tools do not accept models with possibly 
infinite state space. It is important to note that many population 
models have an infinite state space, that is, the number of 
reachable states is infinite. Even when in the real system 
the number of molecules, or more generally, individuals is 
finite, no a priori bound is known, and models do not include 
any constraints on the number of molecules, for example in 
production rules such as ^ A. Another issue is that existing 
tools usually implement algorithms that are not optimized 
specifically for population models, and do not scale well on 
such models. 

SABRE is a tool for the transient analysis of Markov 
population models. In other words, SABRE analysis discrete- 
time, or continuous-time Markov processes that have a struc- 
tured discrete state space and state-depended rate functions. In 
Section In] we give more details on the space structure and the 
state dependency of rate functions that are present in Markov 
processes that represent population models. 

SABRE offers both stochastic and deterministic analysis 
of population models. For stochastic analysis, SABRE imple- 
ments three algorithms: standard uniformization, fast adaptive 
uniformization and Runge-Kutta fourth order method. The 
different configurations in which SABRE may operate are 
depicted in Figure [2] The focus of the tool is on the fast 
adaptive uniformization method, while the remaining methods 
are given for completeness and comparison. 

Fast adaptive uniformization is a variant of the uniformiza- 
tion method|34, 36 1 which is, an efficient method to compute 
probability distributions if the number of states of the Markov 
process is manageable. However, the size of a Markov process 
that represents a biochemical reaction network is usually far 
beyond what is feasible. 

Fast adaptive uniformization|10| improves the original uni- 
formization method at the cost of a small approximation error 
The main ideas for this improvement are the on-the-fly con- 
struction of the state space and the restriction imposed on the 
state space to contain only states with significant probabilities, 
e.g. states that have a probability larger than 10^^^. Even 
though fast adaptive uniformization can treat larger models 
than the previous uniformization methods could, as expected, 
models with remarkably high expected populations remain 
unsolvable and should be studied using deterministic analysis 
of simulation tools. A second down side of fast adaptive 
uniformization is that, due to the approximation error, it can 
overlook rare events of the model, e.g. events that occur with 
a very small probability. 

SABRE is available on-line at 

[hFtp : //mtc .epfl .ch/'matee scu/sabre ; First, the user gives 



an input model (either in SBML format or in guarded 
commands format) and a time horizon and than the transient 
analysis of the system starts (see Figure fTJ. More details on 
the usage of the tool are given in Section |W] 

II. Guarded Commands 

Guarded-command models (GCM) is the input formalism 
of SABRE. GCMs are a textual description of processes 
and are given in the style of Dijkstra's guarded-command 
languageyjj. Their syntax has subsequently been used by 
languages such as Reactive Modules [ 1 1 and by the language 
for specifying PRISM modelsl33|. The basic unit within 
GCMs is a transition class, which is expressed as a guarded 
command that operates on the state variables of the system. 
A transition class encodes for a possibly infinite number of 
state transitions. Within population models, the state variables 
of the system are non-negative integers representing numbers 
of molecules for each species. A guarded command takes the 
form 

guard | - rate -> update 

where the guard is a Boolean predicate over the variables that 
determines in which states the corresponding transitions are 
enabled. The update is a rule that describes the change of the 
system variables if the transition is performed. Syntactically, 
update is a list of statements, each assigning to a variable 
an expression over variables. Assume that x is a variable. If, 
for instance, the update rule is that x is incremented by 1, we 
write X : =x+l. We assume that variables that are not listed in 
the update rule do not change if the transition is taken. Each 
guarded command also assigns a rate to the corresponding 
transitions, which is a function on the state variables. Within 
SABRE, rate is given in infix notation. In the case of 
population models, the update function is incrementing or 
decrementing each variable by a constant integer. 

For a population model with m reactions, the GCM de- 
scription is a set of m guarded commands, which we index as 
guardj h rate^ -^ update^, where each of the commands 
j, with 1 < j < m, describe the j-th reaction of the model. 

GCMs are used to express both CTMCs and DTMCs. The 
difference between the two interpretations comes from the 
semantic given to the rate function of each command. In the 
case of CTMCs, for a given reaction j, the rate function rate^ 
assigns to each state s, a positive real value that represents the 
rate of the outgoing transition j. 

In the case of DTMCs, the rate function rate^ assigns to 
each state s, a positive real value that represents the transition 
probability from state s to its successor on reaction j. The 
functions ratej must define probability distribution over the 
direct successor state, that is, for each state s we impose that 
Sj ratej{s) = 1. If the input is not given in this manner, 
SABRE will automatically normalize the rate functions such 
that the probability distribution condition to be fulfilled. Note 
that this is equivalent to interpreting the input as a CTMC and 
than considering its embedded DTMC. 
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Simple toggle switch example 
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GCMs are used to model systems that exhibit a finite num- 
ber of transition types, but possibly an unbounded number of 
states. For example, in a computer network, the number of type 
of events is finite (send message, receive message, add node, 
etc.) but the number of states is countably infinite, because 
it depends on the number of nodes in the network and on 
the number of requests each of them has. The same holds for 
biochemical reaction networks, each reaction type generates a 
transition class, but the number of states is countably infinite, 
as we do not have any a-priori bound on the variables of 
the system, due to productions rules of the type -> A. 
We therefore conclude that GCMs are a natural formalism 
for describing population models ll20l 

Example 2.1: The bistable toggle switch is a prototype of 
a genetic switch with two competing repressor proteins and 
four reactions. We call the species A and B and we let x = 
(xj^jXe) S Ng be a vector describing a state of the system. 
The reactions are given in Table 11] 

III. Stochastic and Deterministic Analysis 

SABRE performs a transient analysis of the input system, 
that is, SABRE computes the state of the system at time t 
given the state of the system at time 0. SABRE may execute 
either a stochastic analysis or a deterministic analysis of the 
input system; and in the first case the state of the system 
at time t is actually given as a probability distribution over 
the discrete states of the system. The second type of analysis 
-the deterministic analysis- is done over a continuous state 
space, and its result is a single state of this continuous space. 
The result of the deterministic analysis, also known as mean 
field analysis, is an approximation of the expectation of the 
stochastic analysis. Each of the two analysis (stochastic and 
deterministic) may be applied on each of the two semantics 
(CTMC and DTMC), and we will now give short interpreta- 
tions for the results of the four possible combinations. 

A. Stochastic Analysis 

CTMC semantics: We note that the behavior of the 
CTMC is described as a differential equation (known in 
biochemistry as the chemical master equation) and that p{t) is 
the solution of that differential equation at time t. The transient 
stochastic analysis at time t, given the initial state yo with 
probability 1, computes the solution of the chemical master 
equation at time t. Within SABRE, the solution p{t) may be 
computed either by uniformization or by Runge-Kutta explicit 
fourth order method. 

We focus on two uniformization methods for CTMCs, 
standard uniformization ll34l and on its generalization called 
adaptive unifomization f36l. Standard uniformization splits the 



given CTMC into a discrete-time Markov chain (DTMC) and 
a Poisson process, whereas adaptive uniformization splits the 
CTMC into a DTMC and a birth process. SABRE implements 
the optimized algorithm called fast adaptive uniformization 
that has previously been proposed fTOl. One main strength of 
this algorithm is that it closely tracks the set of significant 
states of the state space, where by a significant state we mean 
a state with significant probability. Secondly, another strength 
of the fast adaptive uniformization lies in the on-the-fly 
construction of a non-explicit matrix used in the computation 
of the solution of the DTMC (remember that uniformization 
splits the given CTMC into a DTMC and a birth process). 

DTMC semantics: DTMC semantics are to be used when 
the number of triggered transitions, rather than the elapsed 
time, is of interest. Such situations may arrive, for example, in 
population genetics models or as a part of the uniformization 
method. A transient analysis of a DTMC consists in a series 
of matrix- vector products: 

p{k + I) = p{k) ■ P, 

with P being the probability transition matrix of the DTMC 
and p{k) being a row vector representing the probability 
distribution after k steps. In our algorithm! 10], the main phase 
of the DTMC transient analysis is called the propagation 
phase. The propagation phase completes the equivalent of a 
matrix-vector product by moving probability mass from on 
state X to all direct successors of x (including x itself if any 
self-loops are present). SABRE approximates the probability 
distribution over the states of the system after k reactions 
have happened, given the state yo of the system before any 
reaction happens. Formally, SABRE approximates the vector 
p{k) — Syg ■ P'', where Sy^ is a dirac probability distribution 
in point ijq. 

B. Deterministic Analysis 

CTMC semantics: We can give an approximate solution 
of the mean field of the CTMC by using the forth order 
Runge-Kutta method to solve a set of ordinary differential 
equations simpler than the CME. This set of equations are 
known as the reaction rate equations U_5J and express the 
change in the expectation of each variable over time. In the 
thermodynamic limit (that is, the number of molecules and the 
volume of the system approach infinity) the Markov model and 
the macroscopic ODE description are equal ETl . Therefore, 
for large populations, the deterministic analysis can be used 
to approximate the mean field of the CTMC. 

DTMC semantics: As in the case of CTMCs, for comput- 
ing the first moment of the transient solution of a DTMC, we 
can directly solve a simpler set of equations that are written 
directly over variables that represent the expectancies of the 
stochastic solution of the input DTMC model lIZTl . 

The expected number of molecules changes determinis- 
tically over discrete time, as described by the following 
equation: 

x{k + l) = x{k) -A, 



where A is a probability matrix, and each of its entries a^ ^ 
give the probability for a species i to modify into a species 
j. Such analysis are useful for discrete-time models as those 
used to validate communication protocols. 

IV. Tool Interface 

From the tool's interface, we have several ways of selecting 
a model for analysis. One can load an existing model, upload 
an SBML file or introduce a GCM text description of the 
system to analyze. SBML is a standardized format for rep- 
resenting models of biological processes, such as metabolism 
or cell signaling and is the input to SABRE's core program. 
GCMs that have update functions with constant increment (or 
decrement) have a straight forward translation to SBML. 

Example 4.1: We continue the toggle switch example with 
its SBML description. For brevity, we only give one reaction 
of the model. We observe that the rate function is not restricted 
to a particular template and is written following the mathML 
standard. 

<sbml . . . > 

1 <model> 
1 . . . 

3 <listOf Species> 

4 <species id="A" initialAmount="133"/> 

5 <species id="B" initialAmount="133"/> 

6 </listOfSpecies> 

7 <listOf Reactions> 

8 <reaction id="Rl"> 

9 <listOfProducts> 

10 <speciesRef erence species="A"/> 

11 </listOfProducts> 

12 <listOfModifiers> 

13 <speciesRef erence species="B"/> 

14 </listOfModifiers> 

15 <kineticLaw> 

16 <math . . .> 

17 <apply> <divide/> 

18 <ci> cl </ci> 

19 <apply> <plus/> 

20 <ci> c2 </ci> 

21 <apply> <times/> 

22 <ci> B </ci> 

23 <ci> B </ci> 

24 </apply> 

25 </apply> 

26 </apply> 

27 </math> 

28 <listOfParameters> 

29 <parameter id="cl" value="3000"/> 

30 <parameter id="c2" value="11000"/ 

31 </listOfParameters> 

32 </kineticLaw> 

33 </reaction> 
34 

35 </listOfReactions> 

36 </model> 



36 </sbml> 

Once the model is chosen, we choose a configuration of 
the analysis by choosing the semantics, the mode and, if 
needed, the type of stochastic solution. Finally, we choose 
a time horizon, or the number of steps for which we want the 
system to run. We also give as an input a dump time td, which 
corresponds to the intermediate results, that is, the system 
will compute the distributions for tci,2 -t^,- ■ -t. The program 
computes the intermediates and the final results which are then 
dynamically plotted for each species, as the computation runs 
(see Figure [T]|. If the uniformization method is selected, the 
user also needs to provide an estimate of the maximal exit 
rate over all reachable states. If the estimate is too small, the 
compuation needs to be restarted, and if the estimate is too 
large, the computation is likely to take longer. It is standard 
uniformization which is especially touched by choosing a too 
large upper bound on the maximal exit rate. Estimating this 
upper bound by heuristics such as those used for the sliding 
window algorithm ll2Tl is an on going work. 

V. Software Architecture 

SABRE is available on line, assuring this was a fast and 
portable release of our implementation. The core of our tool 
is implemented in C++, while the website that hosts it is 
implemented using PHP and Javascript. The user provides 
the desired input through the web interface, than a query is 
generated to the 3GHz Linux machine on which SABRE is 
installed. The server sends back to the user intermidiate results 



which are then plotted as we show in Section IV 



A. Components 

SABRE's different components are activated as shown in 
Figure [2] Depending on the chosen semantics, analysis mode 
and, if necessary, stochastic solution type, SABRE calls the 
coresponding method. Some of the functionalities are shared 
among different methods, for example the DTMC solution is 
accessed either directly from choosing the DTMC semantics, 
either indirectly, by the uniformization algorithm. As well, 
Runge-Kutta method, is used both as a solver of the CME 
or as the solver of the reaction rate equations. 

B. Data Structure 

We present an efficient data structure used by SABRE when 
used in stochastic analysis mode. SABRE's main focus is 
on a fast implementation of the fast adaptive algorithm, so 
we will use this algorithm to motivate the choice of our 
data structure. However, the same kind of reasoning works 
if one wants to optimize the Runge-Kutta implementation. 
>The most computationally demanding part of fast adaptive 
uniformization is the probability propagation phase, which 
performs the equivalent of one matrix-vector product in a 
DTMC. We therefore need a data structure that is efficient 
during this step. 

First, we mention that, for each state, along with the state 
description, we need to record additional information about 
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Fig. 1. Dynamic update of the plots witliin the web interface. The evoluation of the probability distribution over the number of monomers in Goutsias' 
model, as given in UQJ . 
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Fig. 2. Software architecture. Depending on the selected semantics, analysis 
mode and, eventually, type of stochastic solution, SABRE computes the 
desired results. The vector p{t) is the transient probability distribution after 
time t, while the vector p{k) is the transient probability distribution after k 
steps. For the deterministic analysis, the values rn{t) and m(fc) coirespond 
to the mean field of the corresponding CTMC, respectively DTMC. The value 
A represents the maximum exit rate of the CTMC and is required only by 
uniformization. 



the probability of the state, about its successors, and about the 
rates/probabilities of the reactions that lead to those respective 
successors. We gather all this information in a structure called 
node. During the propagate phase we iterate over all nodes of 
the state space, and for each node we move probability mass 
along all of its outgoing transitions. Note that, initially the state 
space has a single state, and that states are dynamically added 
to the state space as they are discovered. That is, some of the 
direct successors of n may be newly discovered, and in this 
case they are added to the state space data structure. Therefore, 
ideally, the data structure used for storing information about 
the state space would have the following characteristics. 

• [Fast sequential access.] For enumerating all nodes. We 
note that this is a property of the array primitive type of 
most programming languages. 

• [Fast search.] For quickly finding the successors of a 
node. We note that this is a property of map or hash 
type of many programming languages. 

• [Fast add.] For dynamically adding newly discovered 



states to the current state space. 

• [Fast delete.] For dynamically removing states that have 
close to zero probability. 

We summarize the comparison between arrays and hashes 
in Table |ll] Arrays allow fast sequential access, fast add 
but slow search and delete operations. Hashes allow fast add, 
delete, search, but slow sequential access. We propose a hybrid 
solution that has the advantages of each data structure at the 
expense of extra memory usage. 

Our hybrid data structure is composed of: 

• array nodes that acts as a function from index -^ node 

• hash index that acts as a function from state — > index 

• vector inactive_nodes of indices of nodes that have 
become inactive as a result of a delete. 

This mixture of structures lets us give fast implementations 
for each of the required operations: 

• Sequential access Simple iteration over the elements of 
nodes. 

• Search Search within index followed by an access in 
nodes. 

• Delete state The nodes array is allocated statically, 
so physically erasing a node would be expensive. The 
alternative is to mark the node for deletion by inactivating 
it -setting its probability to zero- and adding it to the 
inactive_nodes vector. Because of their zero prob- 
ability, inactive nodes are ignored when iterating over all 
states. An inactive node has two possible futures: either 
it will be reoccupied by a newly added state, either it will 
be deleted during a compress phase. The compress phase 
is initiated when the number of inactive nodes covers 
more then 20% of the number of both active nodes and 
inactive nodes and it consists of eliminating all inactive 
nodes and rearranging the active nodes in a contiguous 
region. 

• Add state When we add a state to the state space, we 
need to assign it to a node within the nodes array. 
The nodes array is allocated statically and during the 
program's initialization phase, it is initialized to 2^*^ free 
nodes. When we add a new state, if inactive_nodes 
is non-empty, that is, if an inactive node exist, assign 
the state to this node, which now becomes active. If 
inactive_nodes is empty, we check whether we still 
have free allocated nodes, that is, we check whether the 



TABLE II 

Data structure comparison 



Data Structure 


Sequential access 


Search 


Add 


Delete 


Arrays 


fast 


slow 


fast 


slow 


Maps 


slow 


fast 


fast 


fast 


Hybrid solution 


fast 


fast 


complex, but fast 


complex, but fast 
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Case Studies Summary 
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Model 


Time 
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States 


Stochastic 


Exclusive switch 


94s 
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3047 


Detemiinistic 


Enzymatic reaction 


< Is 


- 


1 


Stochastic 


Moran's model 


49s 
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number of active nodes has reached the size allocated 
to nodes. If free nodes exist, we assign the new state 
to a free node, if free nodes do not exist we need to 
allocate extra 2^*^ nodes to nodes and then pick a 
newly created free node. We note that the reallocation 
operation is expensive but happens only rarely, e.g. when 
the state space first reaches one million, two millions, 
three milUons states and so on. 

VI. Case Studies 

We present case studies for stochastic and deterministic 
analysis of CTMCs and for the stochastic analysis of DTMCs. 
For more and larger experiments on stochastic analysis of 
CTMCs we refer the reader to the paper giving the fast 
adaptive uniformization algorithmfTOl. All our experiments are 
performed on a 3GHz Intel Linux PC, with 6 GB of RAM. 
We give the results of our experiments in Table |lll] 

A. Genetic exclusive switch 

The exclusive genetic switch we analyze involves two 
species of proteins that may bound to the same promoter site. 
We denote the unbounded proteins by A'^i and N2 and the 
bounded ones by ri and r2||4l. The guarded commands for this 
model are given in Table llV] The rate functions are evaluated 
for the state {x^^ , x^ , x^^ , x,.^ ), where xn^ is the number of 
molecules of type A^i and so on. 

When it is bounded to the promotor site, a protein represses 
the production of the other protein. And so, for example, 
production of A^i only happens if no N2 molecule is bounded 
to the promoter site (see rate function of first reaction). iVi or 
N2 may bound only to a free promotor site (see rate functions 
of the third and seventh reaction). Note that it always holds 

that Xri + Xr2 < 1- 

We run the system from initial state (25, 0, 0, 0) for a period 
of time of 10000 units with constants: 51 = (72 = 0.05, di = 
^2 — 0.005,^1 = 62 = 0.1, ui — U2 = 0.005, and present the 
solution in Figure [3] 

B. Enzymatic reaction 

We use enzyme-catalyzed substrate conversion to exemplify 
how to perform a deterministic analysis under continuous- 
time semantics. The enzymatic reaction is described by three 
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Fig. 3. Exclusive switch at time 10000. The x-axis gives the number of A^i 
molecules and the y-axis gives the number of N2 molecules. Each point of the 
plot corresponds to the states of systems that have the have the corresponding 
number of Ni and N2 molecules. The darker the point is, the more probability 
mass it holds. We can notice the bistable behaviour from the two regions of 
black points, one for A'^i = and one for N2 = 0. 



reactions (see Table [Vji, that involve four chemical species, 
namely, enzyme {E), substrate (S), complex (C), and product 
(P) molecules. The state of the system is described by the 
vector {xE,xs,xc,xp), which gives the existing number of 
molecules of each type. 

For our experimental results, we chose the same parameters 
as in ISl, that is, initial state y = (1000, 100, 0, 0), time 
horizon t = 70, and rate constants ci = C2 = 1 and C3 — 0.1. 
For the case deterministic analysis we can not give any error 
bounds, as shown in Table lin] 



C. Moran 's population model 

As a simple example of how SABRE operates on DTMC 
models we choose Moran's genetic population model, which 
can be seen as a set of biochemical reactions, more specifically 
as one reversible reaction. 

For a population of N individuals, with two alleles, Ai 
and A2, we are interested to find the probability of fixation 
of Ai, that is, the probability for Ai individuals to be equal 
to N after a certain time. We have two reactions: A2 — J> Ai 
and Ai — > A2. For xa^ individuals with Ai allele and xa^ 
individuals with A2 allele, the probability of the first reaction 
isi^+s ^ 
reaction, its probability is ^^ + s 

We choose TV = 1000 and s = 2e — 3, the initial state 
of XAi ~ 1 and we perform a transient analysis until time 
k = 10^, at this time, the probabihty of fixation is 0.00049. 
In this case the error we obtain is because no cutting is 
performed, the state space is kept at its complete size of 1001. 



^J- where s is a small constant. As for the second 

N ■ 



TABLE IV 

Genetic exclusive switch 



Reaction 


Guarded Command 


Description 


^ Ni 


true 


hgi ■ (l-Xr^) 


^ xjvi 


= XNi + 1 


Production of A^i 


JVi ^0 


a;jvi >0 


h di ■ XiVi 


-5> xjvi 


= xjvi - 1 


Degradation of A^i 


ATi ^n 


X]Vi > 


h fei ■ (1 - Xri - Xrj) 


^> xjvi 


= xjvi - l;a;ri 


:= Xri + 1 


Binding of A''i 


ri -^ Ni 


Xr-, > 


h Ml ■ Xri 


-^ XNi 


= a;jvi +l;xri 


:= Xr, - 1 


Unbinding of A''i 


-^N2 


true 


1-92 ■ (1- Xr-^) 


-> a:jv2 


= XJV2 + 1 


Production of N2 


N2 ^9 


Xff^ > 


h ^2 ■ a^'iVa 


-^ XN2 


= a;jv2 - 1 


Degradation of N2 


N2 -s- r2 


XiVa > 


h 62 ■ (1 — 2^ri — I^ra) 


-5- XJV2 


^ ^iV2 -'^i "^^2 


:= Xrj + 1 


Binding of N2 


r2 — > A''2 


Xr2 > 


h 1/2 ■ a;r2 


-5> XJV2 


= xn2 +I;a:r2 


:= Xr2 - 1 


Unbinding of N2 



TABLE V 
Enzymatic reaction 



Reaction 


Guarded Command 


Description 


E + S^C 


XE > and xs > 


h ci-XE-xs 


-^XE 


= XE - l;xs 


= xs - l;xc 


= xc + l 


Formation of complex 


C -^ E + S 


xc > 


h C2 ■ xc 


— >• XE 


= XE + l;xs 


= xs + l;xc 


= xc — I 


Dissociation of complex 


C^E + P 


xc > 


h C3-XC 


-^XE 


= XE + l;xc 


■.= xc - l;xp 


■.= xp + l 


Product production 



VII. Comparison with other tools 

Several tools for stochastic analysis of Markov chains have 
been developed by communities such as probabilistic veri- 
fication, computational biology and performance evaluation 
among others. Here, we provide a comparison with the tools 
that are the closest to SABRE. The PRISM tool |28|, which 
is widely used in probabilistic verification, considers a more 
general class of Markov processes than population models. 
For instance, it does not restrict the update function such that 
it allows only a constant change of the state variables. The 
models addressed by PRISM are less structured and typically 
they do not have state dependent rate functions. PRISM uses 
powerful minimization techniques such as bisimulation that do 
not result in significant reductions in the case of population 
model. PRISM requires that upper bounds on the state vari- 
ables are given as an input by the user. As opposed to that 
the SABRE tool finds appropriate bounds automatically and 
avoids an exhaustive state space exploration. The drawback 
is that the SABRE tool cannot guarantee the validity of 
properties such as "Is the probability to reach state x within 
t time greater than p?" but gives an approximate solution. As 
opposed to that PRISM can guarantee such properties. On the 
other hand, since SABRE avoids an exhaustive state space 
exploration it is able to handle much larger models with state- 
dependent rates. Infamy is a model-checking tool for infinite- 
state CTMCs by Zhang et al. |17|. Depending on the desired 
precision, their algorithm simply explores the reachable states 
up to a finite path depth. In contrast, our approach takes into 
account the direction into which the probability mass moves, 
and constructs a sequence of abstract models "on-the-fly," 
during the verification process. Similar approaches have also 
been used in the context of biochemical reaction networks |6l . 

Other tools for stochastic analysis of Markov chains, such 
as ETMCC|22|, MRMCEH, APNNtoolbox|5|, SHARPEES), 
SPNP|24|, and MobiusfSl, are conceived for answering per- 
formance analysis questions and as PRISM, due to their 
exhaustive state space exploration can not be applied to infinite 
models. 



Dizzyt'32|, SnoopyfTSl and Copasi|25 1 are tools for stochas- 
tic simulation alone and not do not compute probability 
distributions over states. |9| Bio-PEPA|J21 is a language for 
modeling and analysis of biochemical networks. For numerical 
analysis and verification problems Bio-PEPA uses PRISM's 



engine. 



VIII. Conclusion 



We have introduced SABRE, a tool for stochastic analysis 
of biochemical reaction networks and of population models in 
general. We have motivated the choice of guarded commands 
as input formalism for our tool and the need for a stochastic 
analyzer specialized on biological systems. SABRE currently 
has the form of an accessible web tool, which was chosen 
out of the need to deliver our algorithms and optimizations in 
a fast and portable way. However, an offline version release 
is planned for the future. For completeness and comparison, 
SABRE also performs deterministic analysis of the input 
system. 
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